feat: improve AI agent discoverability #8607
Add agent-readiness signals that the static docs build can legitimately serve:
- robots.txt: add Content-Signal directive (search/ai-input/ai-train=yes) declaring AI content-usage preferences (contentsignals.org)
- customHttp.yml: add RFC 8288 Link headers advertising the API catalog and the existing llms.txt documentation index; set linkset media type
- /.well-known/api-catalog: new RFC 9727 linkset pointing agents at the llms.txt / llms-full.txt exports and sitemap (generated at build time, mirroring how robots.txt and sitemap.xml are emitted in postBuildTasks)
- add unit tests for the API catalog generator
Add /.well-known/agent-skills/index.json (Agent Skills Discovery RFC v0.2.0) advertising the real amplify-workflow skill from awslabs/agent-plugins.
- generate-agent-skills.mjs: build-time generator emitting the index, sourcing name/description from the upstream SKILL.md frontmatter; url points at the agent-plugins docs/marketplace install page, so no sha256 digest is published (the page is discovery/install guidance, not a downloadable artifact)
- wire writeAgentSkillsIndex into postBuildTasks (mirrors robots/sitemap/catalog)
- customHttp.yml: advertise the index via an additional Link relation
- add unit tests for the index generator
Surface the real AWS Knowledge MCP server and make the generated per-page markdown twins discoverable and correctly typed.
MCP server card:
- generate-wellknown.mjs: emit /.well-known/mcp/server-card.json describing the public, no-auth AWS Knowledge MCP server (https://knowledge-mcp.global.api.aws, HTTP transport, tools) which authoritatively indexes Amplify docs. The card is an honest pointer to AWS's managed server, not a claim that docs.amplify.aws is itself an MCP endpoint.
- wire writeMcpServerCard into postBuildTasks; advertise via a Link rel
Markdown vending:
- customHttp.yml: serve /ai/**/*.md as text/markdown; charset=utf-8
- Layout: inject <link rel="alternate" type="text/markdown"> into each Gen2 content page's <head> for automatic per-page discovery, reusing MarkdownMenu's getMarkdownUrl mapping (now exported) and mirroring its gate (skip gen1/home/overview pages that have no .md twin)
- extend generate-wellknown tests for the server card
The api-catalog linkset entry was missing the required service-desc
relation, so validators could not recognize a machine-readable service
description. Map the relations per RFC 9727:
- service-desc: llms-full.txt (complete machine-readable export)
- service-doc: llms.txt (documentation index)
- service-meta: sitemap.xml
Each relation is now an array of { href, type } objects per Appendix A.
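As a rough illustration of the relation mapping described above, the linkset body could be built along these lines. This is a minimal sketch, not the PR's generator: the `generateApiCatalog(domain)` signature, the root-relative file locations, and the media types chosen for each target are assumptions made for the example.

```typescript
// Sketch only: the real generator lives in generate-wellknown.mjs and may differ.
interface LinkTarget {
  href: string;
  type: string;
}

interface LinksetEntry {
  anchor: string;
  'service-desc': LinkTarget[];
  'service-doc': LinkTarget[];
  'service-meta': LinkTarget[];
}

// Assumption: the exports sit at the site root; media types are illustrative.
function generateApiCatalog(domain: string): string {
  const entry: LinksetEntry = {
    anchor: `${domain}/`,
    // Complete machine-readable export
    'service-desc': [{ href: `${domain}/llms-full.txt`, type: 'text/plain' }],
    // Documentation index
    'service-doc': [{ href: `${domain}/llms.txt`, type: 'text/plain' }],
    // Sitemap as service metadata
    'service-meta': [{ href: `${domain}/sitemap.xml`, type: 'application/xml' }]
  };
  // A linkset document is an object with a top-level "linkset" array.
  return JSON.stringify({ linkset: [entry] }, null, 2);
}
```

Keeping each relation as an array of `{ href, type }` objects means a validator can look up `service-desc` directly without guessing at the shape.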
Register WebMCP tools via document.modelContext (with navigator.modelContext fallback) so in-browser AI agents can call the docs site's key read actions:
- get_current_page_markdown: returns the current page's generated Markdown twin
- get_documentation_index: returns the llms.txt documentation index
Both tools are read-only and backed by content the build already produces, so they return real data rather than stubs. The API is feature-detected and the component renders nothing, making it a silent no-op in browsers without WebMCP. Tools are torn down on unmount via an AbortSignal. Mounted from Layout on the same Gen2 content pages that have a Markdown twin.
With trailingSlash: true, Amplify Hosting 301-redirects the extensionless path /.well-known/api-catalog to /.well-known/api-catalog/, which has no file and returns 404 -- so the RFC 9727 catalog was unreachable at its canonical path. Files with an extension are served directly with a 200.
- Write the catalog as api-catalog.json (extensioned, served as 200)
- Add a 200-rewrite in redirects.json mapping /.well-known/api-catalog to that file so the canonical path resolves in place without a redirect
- Set application/linkset+json on both paths in customHttp.yml
- Add a test asserting the 200-rewrite contract
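The 200-rewrite contract can be checked with a small predicate over the redirect rules. The sketch below is illustrative: the `RedirectRule` shape and the `checkCatalogRewrite` helper are hypothetical names, the target path assumes the file is written to /.well-known/api-catalog.json, and the ordering check assumes a `/<*>` catch-all rule may exist.

```typescript
// Hypothetical rule shape, modelled on a source/target/status redirect entry.
interface RedirectRule {
  source: string;
  target: string;
  status: string;
}

// Returns true when the catalog path is rewritten in place (status "200")
// to the extensioned file AND the rule precedes any catch-all rule.
function checkCatalogRewrite(rules: RedirectRule[]): boolean {
  const index = rules.findIndex((r) => r.source === '/.well-known/api-catalog');
  if (index === -1) return false;

  const rule = rules[index];
  const catchAllIndex = rules.findIndex((r) => r.source === '/<*>');
  const precedesCatchAll = catchAllIndex === -1 || index < catchAllIndex;

  return (
    rule.status === '200' &&
    rule.target === '/.well-known/api-catalog.json' &&
    precedesCatchAll
  );
}
```

Checking the position as well as the presence of the rule guards against a later reordering that would let a catch-all swallow the canonical path.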
osama-rizk
left a comment
Nice, well-scoped change — solid tests and an honest scope section. No crash-level bugs; inline comments below, most-impactful first. (Checked and NOT flagging: redirects.json — the AJV validator accepts status: "200" as a string and the rule sits ahead of the /<*> catch-all, so it resolves 200 today; only nit is the test asserts existence, not ordering.)
| await fs.writeFile(catalogPath, generateApiCatalog()); | ||
| console.log(`api-catalog written to ${catalogPath}`); | ||
| } catch (error) { | ||
| console.error(`Error writing api-catalog to ${catalogPath}:`, error); |
Swallowed write error ships a green build that advertises 404s. This catch logs and returns, so a failed write still passes. Meanwhile customHttp.yml emits a global Link header to /.well-known/api-catalog, whose links point at llms-full.txt/llms.txt. If a generator throws, agents follow a Link header to a missing file. writeSitemap/writeRobots do the same, but they aren't advertised in a response header — the blast radius is new here. Consider failing the build on write error (same applies to generate-agent-skills.mjs).
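One way to fail the build on a write error is to keep the log line but rethrow. The sketch below is a suggestion under assumptions, not the PR's code: the `writeFile` parameter is injected here purely so the example is self-contained, whereas the real task calls fs.writeFile directly.

```typescript
// Assumption: an injectable writer stands in for fs.writeFile.
type WriteFile = (path: string, data: string) => Promise<void>;

async function writeApiCatalog(
  catalogPath: string,
  body: string,
  writeFile: WriteFile
): Promise<void> {
  try {
    await writeFile(catalogPath, body);
    console.log(`api-catalog written to ${catalogPath}`);
  } catch (error) {
    console.error(`Error writing api-catalog to ${catalogPath}:`, error);
    // Rethrow so the post-build task rejects and the build goes red,
    // instead of shipping a Link header that points at a missing file.
    throw error;
  }
}
```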
| } | ||
| function getMarkdownUrl(route: string): string { | ||
| export function getMarkdownUrl(route: string): string { |
getMarkdownUrl doesn't strip the query string. usePathWithoutHash splits on # only, so /react/build-a-backend/auth/?foo=bar → /ai/pages/build-a-backend/auth/?foo=bar.md (404). This PR now routes this function into three consumers (<link rel="alternate">, the WebMcp fetch, and the copy/open menu), so one bad URL propagates everywhere.
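A possible fix is to cut the route at the first `?` or `#` before mapping it. The sketch below is illustrative only: `stripQueryAndHash` and `getMarkdownUrlSketch` are hypothetical names, and the mapping (drop the leading platform segment, prefix /ai/pages/, append .md) is inferred from the example URL in this comment rather than copied from MarkdownMenu.

```typescript
// Remove everything from the first "?" or "#" onward.
function stripQueryAndHash(route: string): string {
  const cut = route.search(/[?#]/);
  return cut === -1 ? route : route.slice(0, cut);
}

// Assumed mapping: /<platform>/a/b/ -> /ai/pages/a/b.md
function getMarkdownUrlSketch(route: string): string {
  const clean = stripQueryAndHash(route);
  // Drop empty segments produced by leading and trailing slashes.
  const segments = clean.split('/').filter(Boolean);
  const withoutPlatform = segments.slice(1);
  return `/ai/pages/${withoutPlatform.join('/')}.md`;
}
```

With this in place, `/react/build-a-backend/auth/?foo=bar` and `/react/build-a-backend/auth/#section` both map to the same `.md` URL as the bare route.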
| type: 'claude-skill', | ||
| description: | ||
| 'Build and deploy full-stack web and mobile apps with AWS Amplify Gen2 (TypeScript code-first). Covers auth (Cognito), data (AppSync/DynamoDB), storage (S3), functions, APIs, and AI (Amplify AI Kit with Bedrock) across React, Next.js, Vue, Angular, React Native, Flutter, Swift, and Android.', | ||
| url: `${domain}/react/develop-with-ai/agent-plugins/` |
Skill URL hardcodes /react/ for a platform-agnostic page. The discovery index is global; a docs restructure off /react/ silently publishes a 404 to agents with no error anywhere. Use a platform-neutral/canonical path.
| @@ -170,6 +171,14 @@ export const Layout = ({ | |||
| children?.props?.childPageNodes?.length != 'undefined' && | |||
isOverview guard is inert. children?.props?.childPageNodes?.length != 'undefined' compares a number to the string "undefined" — always true (meant typeof … !== 'undefined'). It works today only because the > 0 clause carries the whole predicate. Pre-existing, but this PR now depends on isOverview to gate markdownUrl, so it re-exposes it.
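The difference between the two comparisons can be shown in isolation. This is a minimal sketch: `inertGuard` and `hasChildPages` are hypothetical helpers that take the length directly, rather than the actual Layout expression.

```typescript
// Loose comparison against the STRING 'undefined': true for every number
// and also for the value undefined, so it never filters anything out.
function inertGuard(length: number | undefined): boolean {
  return (length as unknown) != 'undefined';
}

// Intended check: the value is defined and there is at least one child page.
function hasChildPages(length: number | undefined): boolean {
  return typeof length !== 'undefined' && length > 0;
}
```

`inertGuard` returns true for `undefined`, `0`, and `3` alike, which is why the `> 0` clause was carrying the whole predicate.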
| * Fetch a markdown document and return its text, guarding against the SPA | ||
| * fallback returning an HTML page (e.g. a 404) instead of markdown. | ||
| */ | ||
| async function fetchMarkdown(url: string): Promise<string> { |
fetchMarkdown duplicates MarkdownMenu.handleCopy. Both fetch a /ai/pages/*.md URL and reject the SPA HTML fallback with the identical regex pair (/^\s*<!doctype/i, /^\s*<html/i). Fix the fallback detection in one and the other rots. Extract a shared fetchPageMarkdown next to getMarkdownUrl — you already made that move for getMarkdownUrl.
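A shared helper could look roughly like the sketch below. The two regular expressions are the ones quoted in this comment; everything else is an assumption for illustration, including the `fetchImpl` parameter (injected so the example runs without a browser), the minimal `TextResponse` type, and the error messages.

```typescript
// Minimal response shape so the sketch does not depend on DOM typings.
interface TextResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}
type FetchLike = (url: string) => Promise<TextResponse>;

// The SPA fallback serves an HTML page (e.g. a 404) instead of markdown.
const HTML_FALLBACK_PATTERNS = [/^\s*<!doctype/i, /^\s*<html/i];

function looksLikeHtml(body: string): boolean {
  return HTML_FALLBACK_PATTERNS.some((pattern) => pattern.test(body));
}

async function fetchPageMarkdown(url: string, fetchImpl: FetchLike): Promise<string> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }
  const body = await response.text();
  if (looksLikeHtml(body)) {
    throw new Error(`Expected markdown at ${url} but received HTML`);
  }
  return body;
}
```

Both the copy menu and the WebMCP tool would then call the same function, so a change to the fallback detection lands in one place.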
| dotenv.config({ path: './.env.custom' }); | ||
| const DOMAIN = process.env.SITEMAP_DOMAIN |
DOMAIN + ROOT_PATH are copy-pasted across three task files (generate-sitemap, generate-wellknown, generate-agent-skills). Change the output dir or default domain and you edit three files in lockstep. Extract a shared tasks/build-constants.mjs.
| const register = async () => { | ||
| try { | ||
| await modelContext.registerTool( |
Both registerTool calls share one try. If the first rejects (e.g. a transient duplicate-name error during the abort/re-register on fast client-side nav — both names are route-independent), the second tool never registers and the catch swallows it, leaving the page with one or zero tools. Independent trys per tool isolate them.
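Isolating the registrations could be as simple as looping with one try per tool. The sketch below uses assumed shapes: `ToolDefinition`, `ModelContextLike`, and `registerAll` are hypothetical, and the real WebMCP `registerTool` signature (including how the AbortSignal is passed) may differ.

```typescript
// Assumed, simplified shapes for illustration only.
interface ToolDefinition {
  name: string;
  description: string;
  execute: () => Promise<string>;
}

interface ModelContextLike {
  registerTool(tool: ToolDefinition): Promise<void> | void;
}

// Registers each tool in its own try so one rejection cannot block the rest.
// Returns the names that registered successfully.
async function registerAll(
  modelContext: ModelContextLike,
  tools: ToolDefinition[]
): Promise<string[]> {
  const registered: string[] = [];
  for (const tool of tools) {
    try {
      await modelContext.registerTool(tool);
      registered.push(tool.name);
    } catch (error) {
      console.warn(`WebMCP: failed to register ${tool.name}`, error);
    }
  }
  return registered;
}
```

Returning the list of registered names also gives a test something concrete to assert on when one registration is forced to fail.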
| # Link headers advertise agent-discovery resources (RFC 8288 / RFC 9727): | ||
| # the API catalog, the agent skills index, the MCP server card, and the | ||
| # LLM-friendly documentation index. | ||
| - key: 'Link' |
The Link header sits on the global **/* block, so it rides every response (HTML, images, JSON), not just discovery routes — bytes on every request and a wider blast radius for the missing-file cases above. Worth a conscious choice vs. scoping it to the relevant paths.
- getMarkdownUrl: strip query string and hash before building the .md URL, so all three consumers (link rel=alternate, WebMcp, copy menu) get a valid URL for routes with ?query or #hash
- generators: rethrow write errors in the api-catalog, MCP server card, and agent-skills writers so a failed write fails the build instead of shipping a green build whose global Link header advertises a missing file
- WebMcp: register each tool in its own try so one rejected registration can't block the others; reuse the shared fetchPageMarkdown helper
- MarkdownMenu: extract shared fetchPageMarkdown (used by copy menu and WebMcp) so the SPA-HTML fallback guard lives in one place
- Layout: fix inert isOverview guard (typeof x !== 'undefined', not x != 'undefined')
- tasks: extract shared build-constants.mjs (DOMAIN, ROOT_PATH) and a CANONICAL_PLATFORM constant used for the agent-skills URL
- customHttp.yml: document the deliberate choice to keep the Link header on the global block (Amplify patterns are positive-match only; trailingSlash makes pages extensionless, so an html-only pattern would miss real page loads)
- tests: query/hash stripping, fetchPageMarkdown, isolated tool registration, and redirect-ordering coverage
bobbor
left a comment
LGTM.
only smaller nits that are not blocking. we can go forward with this
Summary
Improves how AI agents and crawlers discover and consume the Amplify docs, implemented entirely within what the static export + Amplify Hosting can serve today. These changes came out of running the site through an "agent readiness" scan and fixing every gap that has a legitimate, non-misleading solution.
All additions point at content or capabilities that actually exist (the generated llms.txt/markdown exports, the real awslabs/agent-plugins skill, AWS's public managed MCP server, and read-only browser tools backed by real data), with no stub endpoints or fabricated capabilities.
What's included
Discoverability
- robots.txt (Content-Signal: search=yes, ai-input=yes, ai-train=yes).
- Link response headers (customHttp.yml, RFC 8288) advertising the API catalog, agent skills index, MCP server card, and the llms.txt index.
Agent discovery files (generated at build time, alongside robots.txt/sitemap.xml)
- /.well-known/api-catalog (RFC 9727 linkset, application/linkset+json) with the required relations mapped to the artifacts the build produces: service-desc → llms-full.txt, service-doc → llms.txt, service-meta → sitemap.xml.
- /.well-known/agent-skills/index.json (Agent Skills Discovery RFC v0.2.0) advertising the real amplify-workflow skill from awslabs/agent-plugins. Entries point at the docs/marketplace install page, so no sha256 is published (the page is install guidance, not a downloadable artifact).
- /.well-known/mcp/server-card.json describing the public, no-auth AWS Knowledge MCP Server (https://knowledge-mcp.global.api.aws), which authoritatively indexes Amplify docs. An honest pointer to AWS's managed server: the card explicitly states this site does not host its own MCP server.
Markdown vending (the per-page .md files are already generated under /ai/pages/**)
- <head> emits <link rel="alternate" type="text/markdown" href="/ai/pages/….md">, reusing MarkdownMenu's getMarkdownUrl mapping and gate (skips gen1/home/overview pages that have no .md twin).
- /ai/**/*.md served as text/markdown; charset=utf-8.
WebMCP (in-browser agent tools)
- document.modelContext (with navigator.modelContext fallback) so in-browser AI agents can call the site's key read actions:
  - get_current_page_markdown: returns the current page's generated Markdown twin
  - get_documentation_index: returns the llms.txt index
- AbortSignal.
Out of scope (intentionally)
Several scanned standards require live services / DNS / viewer-request edge compute that don't exist behind a static public docs site, and publishing files for them would mislead agents that trust them:
- /.well-known/oauth-protected-resource) and auth.md: both declare that the site's resources are access-controlled and tell agents how to obtain tokens to reach them. docs.amplify.aws is fully public with no protected API and no agent auth. The scanner passes on the metadata file alone, but asserting a protected resource that doesn't exist could make agents refuse to read public docs or attempt pointless token flows. Deliberately skipped.
- Accept: text/markdown): requires reading a request header at the edge. Amplify Hosting rewrites match on path/query only (confirmed by redirects.json's own validator), and the managed CloudFront distribution exposes no viewer-request function hook to this repo. The per-page <link rel="alternate"> + text/markdown content-type above is the static-friendly equivalent (agents get the markdown at a sibling URL). True same-URL negotiation needs a CloudFront Function on a fronting distribution, owned outside this repo.
Testing
- service-desc/service-meta structure), MCP server card, agent skills index, and the WebMCP component (no-op without API, tool registration, real fetch on execute).
- tsc --noEmit clean on changed components.
- robots.txt, api-catalog, server-card.json, and agent-skills/index.json end-to-end.